Questions

  • Who commits misconduct?
  • Is there any punishment for misconduct?
  • What are the most common types of misconduct?
  • Can we predict future complaints for particular officers?
In [1]:
%matplotlib inline

import pandas as pd
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
In [2]:
df = pd.read_csv("data.csv")
df.head()
Out[2]:
unique_mos_id first_name last_name command_now shield_no complaint_id month_received year_received month_closed year_closed ... mos_age_incident complainant_ethnicity complainant_gender complainant_age_incident fado_type allegation precinct contact_reason outcome_description board_disposition
0 10004 Jonathan Ruiz 078 PCT 8409 42835 7 2019 5 2020 ... 32 Black Female 38.0 Abuse of Authority Failure to provide RTKA card 78.0 Report-domestic dispute No arrest made or summons issued Substantiated (Command Lvl Instructions)
1 10007 John Sears 078 PCT 5952 24601 11 2011 8 2012 ... 24 Black Male 26.0 Discourtesy Action 67.0 Moving violation Moving violation summons issued Substantiated (Charges)
2 10007 John Sears 078 PCT 5952 24601 11 2011 8 2012 ... 24 Black Male 26.0 Offensive Language Race 67.0 Moving violation Moving violation summons issued Substantiated (Charges)
3 10007 John Sears 078 PCT 5952 26146 7 2012 9 2013 ... 25 Black Male 45.0 Abuse of Authority Question 67.0 PD suspected C/V of violation/crime - street No arrest made or summons issued Substantiated (Charges)
4 10009 Noemi Sierra 078 PCT 24058 40253 8 2018 2 2019 ... 39 NaN NaN 16.0 Force Physical force 67.0 Report-dispute Arrest - other violation/crime Substantiated (Command Discipline A)

5 rows × 27 columns

In [3]:
list(df.columns)
Out[3]:
['unique_mos_id',
 'first_name',
 'last_name',
 'command_now',
 'shield_no',
 'complaint_id',
 'month_received',
 'year_received',
 'month_closed',
 'year_closed',
 'command_at_incident',
 'rank_abbrev_incident',
 'rank_abbrev_now',
 'rank_now',
 'rank_incident',
 'mos_ethnicity',
 'mos_gender',
 'mos_age_incident',
 'complainant_ethnicity',
 'complainant_gender',
 'complainant_age_incident',
 'fado_type',
 'allegation',
 'precinct',
 'contact_reason',
 'outcome_description',
 'board_disposition']

Who commits misconduct?

In [4]:
def rank(df, column, countup='unique_mos_id'):
    new_df = df.groupby(column).count().reset_index()[[column, countup]]
    new_df.columns = [column, 'count']
    new_df = new_df.sort_values('count', ascending=False)
    
    return new_df
In [5]:
officer_df = df[['unique_mos_id', 'first_name', 'last_name', 'rank_now', 'mos_ethnicity', 'mos_gender']]
officer_df.drop_duplicates(inplace=True)
<ipython-input-5-23196165539a>:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  officer_df.drop_duplicates(inplace=True)
In [6]:
rank(officer_df, 'mos_gender')
Out[6]:
mos_gender count
1 M 3609
0 F 387
In [7]:
rank(officer_df, 'mos_ethnicity')
Out[7]:
mos_ethnicity count
4 White 2056
3 Hispanic 1154
2 Black 597
1 Asian 186
0 American Indian 3
In [8]:
rank(officer_df, 'rank_now')
Out[8]:
rank_now count
6 Police Officer 1711
3 Detective 1014
7 Sergeant 813
5 Lieutenant 314
0 Captain 63
2 Deputy Inspector 35
1 Chiefs and other ranks 28
4 Inspector 18
In [9]:
num_complaint_df = rank(df, 'unique_mos_id', countup='first_name')
In [10]:
num_bins = 75

fig, ax = plt.subplots()

n, bins, patches = ax.hist(num_complaint_df['count'], num_bins)

ax.set_xlabel('Officers')
ax.set_ylabel('Complaints')

ax.set_title(r'Histogram of Complaints per Cop')

# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()

10% of cops are responsible for 70% of complaints.

In [11]:
num_bins = 75

fig, ax = plt.subplots()

# the histogram of the data
n, bins, patches = ax.hist(
    num_complaint_df['count'], num_bins, cumulative=True, density=True, stacked=True)

ax.set_xlabel('Officers')
ax.set_ylabel('Complaints')

ax.set_title(r'CDF of Complaints per Cop')

# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()
In [12]:
precinct_df = rank(df, 'precinct')
precinct_df.precinct = precinct_df.precinct.map(lambda x: int(x))
precinct_df.head(10)
Out[12]:
precinct count
47 75 2172
46 73 1163
27 44 1139
29 46 1120
40 67 1119
51 79 1062
23 40 1009
49 77 950
30 47 944
74 120 844

The top precinct in particular (75, East NY - Brooklyn) is twice as bad as the next.

In [13]:
import folium

nyc_coor = [40.75,-73.8759]

# instatiate a folium map object with the above coordinate at center
complaint_map = folium.Map(location=nyc_coor,zoom_start=10)

# the path to the geojson file of the manhattan precincts
pathgeo = './precincts.geojson'

# make the chorlopleth map
complaint_map.choropleth(geo_data=pathgeo,
                        data=precinct_df,
                        columns=['precinct', 'count'],
                        key_on='feature.properties.Precinct',
                        fill_color='BuPu', 
                        fill_opacity=0.7, 
                        line_opacity=0.2,
                        legend_name='Complaints per precinct')
# show the map
complaint_map
/usr/local/lib/python3.8/site-packages/folium/folium.py:411: FutureWarning: The choropleth  method has been deprecated. Instead use the new Choropleth class, which has the same arguments. See the example notebook 'GeoJSON_and_choropleth' for how to do this.
  warnings.warn(
Out[13]:
Make this Notebook Trusted to load map: File -> Trust Notebook

Is there any punishment for misconduct?

In [14]:
set(df.board_disposition)
Out[14]:
{'Exonerated',
 'Substantiated (Charges)',
 'Substantiated (Command Discipline A)',
 'Substantiated (Command Discipline B)',
 'Substantiated (Command Discipline)',
 'Substantiated (Command Lvl Instructions)',
 'Substantiated (Formalized Training)',
 'Substantiated (Instructions)',
 'Substantiated (MOS Unidentified)',
 'Substantiated (No Recommendations)',
 'Unsubstantiated'}
In [15]:
board_results_df = rank(df, 'board_disposition')
board_results_df
Out[15]:
board_disposition count
10 Unsubstantiated 15448
0 Exonerated 9609
1 Substantiated (Charges) 3796
6 Substantiated (Formalized Training) 1033
2 Substantiated (Command Discipline A) 964
4 Substantiated (Command Discipline) 851
3 Substantiated (Command Discipline B) 789
5 Substantiated (Command Lvl Instructions) 454
7 Substantiated (Instructions) 248
9 Substantiated (No Recommendations) 165
8 Substantiated (MOS Unidentified) 1

75% of the time there's no punishment at all.

In [16]:
(15448 + 9609) / sum(board_results_df['count'])
Out[16]:
0.7511541459320104

How long do cases stay open?

In [17]:
duration_df = df[['month_received', 'year_received', 'month_closed', 'year_closed']]

duration_df['months_open'] = 12 * (duration_df.year_closed - duration_df.year_received) + \
    (duration_df.month_closed - duration_df.month_received)

duration_df.months_open.mean()
<ipython-input-17-3dc263e7efd0>:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  duration_df['months_open'] = 12 * (duration_df.year_closed - duration_df.year_received) + \
Out[17]:
9.733767012410816

What are the most common types of misconduct?

In [18]:
rank(df, 'fado_type')
Out[18]:
fado_type count
0 Abuse of Authority 20292
2 Force 7636
1 Discourtesy 4677
3 Offensive Language 753
In [19]:
allegation_df = rank(df, 'allegation')
allegation_df.head(10)
Out[19]:
allegation count
57 Physical force 4849
114 Word 3942
97 Stop 2300
85 Search (of person) 2047
20 Frisk 1926
60 Premises entered and/or searched 1555
76 Refusal to provide name/shield number 1483
111 Vehicle search 1405
101 Threat of arrest 1370
112 Vehicle stop 1094

Next steps

  • a Twitter bot to report incidents, possibly with image/video
  • a site to look up these recent reports, combined with existing data